Enhancing Usability of Information Extraction Results with Textual Data Profiling

Authors

  • Jyi-Shane Liu
  • Yung-Wei Cheng
Abstract

Given a targeted subject and a text collection, information extraction (IE) techniques can populate a database in which each record is a subject instance documented in the collection. However, even with state-of-the-art IE techniques, the results are expected to contain errors. Manual error detection and correction are labor-intensive and time-consuming. This validation cost remains a major obstacle to deploying practical IE applications with high validity requirements. In this paper, we propose a string feature-based approach to textual data profiling and invalid data detection. The approach is based on the observation that the values of an attribute in IE results are symbolic form variations of a concept in the IE task subject and may therefore exhibit a certain congruity in terms of some string features. We conducted experiments to verify that invalid IE values can be detected effectively using surface-form string features.
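
The idea of profiling an attribute by surface-form string features can be illustrated with a minimal Python sketch. This is not the authors' method, which the abstract does not specify. The functions `shape`, `profile`, and `flag_invalid`, the pattern alphabet (`A`, `9`, `_`), and the `min_support` threshold are all hypothetical choices made for illustration. The sketch maps each extracted value to a coarse character-class pattern and flags values whose pattern is rare among the values of the same attribute.

```python
from collections import Counter


def shape(value: str) -> str:
    """Map a string to a coarse surface-form pattern.

    Letters become 'A', digits become '9', whitespace becomes '_',
    and any other character is kept as-is. Runs of the same symbol
    are collapsed, so '2005' and '14' both map to '9'.
    """
    out = []
    for ch in value:
        if ch.isalpha():
            sym = "A"
        elif ch.isdigit():
            sym = "9"
        elif ch.isspace():
            sym = "_"
        else:
            sym = ch
        if not out or out[-1] != sym:
            out.append(sym)
    return "".join(out)


def profile(values):
    """Return the relative frequency of each pattern within one attribute."""
    counts = Counter(shape(v) for v in values)
    total = len(values)
    return {pattern: n / total for pattern, n in counts.items()}


def flag_invalid(values, min_support=0.2):
    """Flag values whose pattern is rarer than min_support (an assumed cutoff)."""
    prof = profile(values)
    return [v for v in values if prof[shape(v)] < min_support]


# Hypothetical values extracted for a 'date' attribute.
dates = ["2005-03-14", "1999-12-01", "2010-07-30",
         "2001-01-09", "last spring", "2003-11-22"]
suspects = flag_invalid(dates)
print(suspects)
```

In this toy example five of the six values share one pattern, so the remaining value falls below the threshold and is reported as a candidate for manual review. A real profile would likely combine several features (length, token count, character classes) rather than a single pattern.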

Similar resources

Improving Relevancy Accessing Linked Opinion Data

We introduce a search engine and information retrieval system for providing access to linked opinion data. Natural language technology of generalization of syntactic parse trees is introduced as a similarity measure between subjects of textual opinions to link them on the fly. Information extraction algorithm for automatic summarization of web pages in the format of Google sponsored links is pr...

Usability of Mobile Website of the Libraries of Top Medical Sciences Universities in Iran

Background and Aim: Usability analysis is one of the essential methods for evaluating academic libraries’ mobile websites; a website’s usability is its ease and simplicity of use. This study aims to evaluate the usability of the mobile websites of the libraries of top medical universities in Iran...

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. Our approach relies mainly on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

Discovering the Underlying Components Affecting the Usability of IoT in Iranian Libraries: A Theory Based on Context

Objective: This study aims to discover the underlying contextual components of IoT usability in Iranian libraries, using a qualitative approach consistent with grounded theory. Method: This qualitative study was conducted based on grounded theory. Data were collected through semi-structured interviews with 13 faculty members of knowledge and information science, selected using purposeful and chain methods. Responsi...

Designing and Evaluating an Education-based Follow-up System for Cardiac Patients

Introduction: Chronic diseases are among the most challenging health issues in the world. Although no definitive treatment has been found for such diseases, electronic health strategies can dramatically reduce their complications by enhancing patients' awareness and monitoring their treatment. The main objective of this study was to design an education-based follow-up system for cardiac...

Publication date: 2005